Classification of Web Documents using Fuzzy Logic Categorical Data Clustering

Authors

  • George E. Tsekouras
  • Christos Anagnostopoulos
  • Damianos Gavalas
  • Economou Dafhi
Abstract

We propose a fuzzy clustering algorithm for categorical data and use it to classify web documents. We extract a number of words for each thematic area (category) and treat each word as a multidimensional categorical data vector. For each category, the algorithm partitions the available words into a number of clusters, where the center of each cluster corresponds to a word. The dissimilarity between two words is measured with the Hamming distance. A new document is then classified in two steps. First, for each category, we compute the minimum distance between the document and all the cluster centers of that category. Second, we select the smallest of these minimum distances and assign the document to the category that corresponds to it.
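
The two-step classification rule can be illustrated with a minimal Python sketch. The abstract does not say how a word is encoded as a categorical vector or how the distance between a whole document and a cluster center is computed, so both are assumptions here: each word is encoded as a fixed-length tuple of characters, and the document-to-center distance is taken as the average Hamming distance over the document's words. The names `encode`, `document_to_center`, `classify`, and the padding symbol `PAD` are hypothetical, and the cluster centers are supplied by hand rather than produced by the fuzzy clustering step.

```python
from typing import Dict, List, Sequence, Tuple

PAD = "_"  # hypothetical padding symbol for short words


def encode(word: str, length: int = 8) -> Tuple[str, ...]:
    """Assumed encoding: a word becomes a fixed-length tuple of characters."""
    return tuple(word.lower()[:length].ljust(length, PAD))


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Hamming distance: number of positions where the two vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def document_to_center(doc_vectors: List[Tuple[str, ...]],
                       center: Tuple[str, ...]) -> float:
    """Assumed document-to-center distance: mean Hamming distance over words."""
    return sum(hamming(v, center) for v in doc_vectors) / len(doc_vectors)


def classify(doc_words: List[str],
             centers_by_category: Dict[str, List[str]],
             length: int = 8) -> Tuple[str, Dict[str, float]]:
    """Two-step rule from the abstract, under the assumptions stated above."""
    doc_vectors = [encode(w, length) for w in doc_words]
    # Step 1: for each category, the minimum distance to any of its centers.
    min_per_category = {
        category: min(document_to_center(doc_vectors, encode(c, length))
                      for c in centers)
        for category, centers in centers_by_category.items()
    }
    # Step 2: the category whose minimum distance is the smallest overall.
    best = min(min_per_category, key=min_per_category.get)
    return best, min_per_category


# Hand-picked centers, for illustration only (not taken from the paper).
centers = {"sports": ["football", "tennis"], "finance": ["market", "stock"]}
label, distances = classify(["football", "footwear"], centers)
```

Averaging over the document's words is only one possible aggregation; taking the minimum or the sum would fit the same two-step rule equally well.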

Related resources

Web Document Clustering Using Fuzzy Equivalence Relations

Conventional clustering classifies the given data objects into exclusive subsets (clusters). That means we can clearly discriminate whether an object belongs to a cluster or not. However, such a partition is insufficient to represent many real situations. Therefore, a fuzzy clustering method is offered to construct clusters with uncertain boundaries and allows that one object belongs to overl...

A New Approach to Classify Text based on CosFuzzy Logic

Evaluating objective-type examinations by computer is easy. Evaluating descriptive-type questions is more difficult, however, and no significant research on it has taken place. In this paper I propose a new solution to this problem: text classification using a new fuzzy logic named CosFuzzy Logic. Document Clustering is a useful technique that organizes a large quanti...

Optimization of a Search Engine for an Organized and Effective Browsing

In web search applications, queries are submitted to search engines to represent the information needs of users. The task is to discover the number of diverse user search goals for a query and to depict each goal automatically with some keywords. The existing work proposes a novel approach to infer user search goals by analyzing search engine query logs. First propose a novel approach to infer user searc...

Modified Particle Swarm Optimization Based Adaptive Fuzzy K-Modes Clustering for Heterogeneous Medical Databases

The main purpose of data mining is to extract hidden predictive knowledge, useful information, and patterns from large databases for use in decision support. The medical field has a large number of heterogeneous databases, from which extracting hidden useful knowledge for data classification is difficult. In order to cluster and classify the whole databases o...

Using Fuzzy Logic Clustering Discover Semantic Similarity in Web Document

The complex and dense interactions between terms in documents produce vague and ambiguous meanings. Complicated associations exist within one web document and in its links to others. Most of these approaches perform similarity and feature selection methods. There is a need for complex document clustering that produces meaningful documents. This paper proposed methodology is capable of handles...

Publication date: 2007